Automatic Keyword Extraction from Spoken Text. A Comparison of Two Lexical Resources: EDR and WordNet
Abstract
Lexical resources such as WordNet and the EDR electronic dictionary (EDR) have been used in several NLP tasks. Probably in part because the EDR is not freely available, WordNet has been used far more often than the EDR. We used both resources on the same task to make a comparison possible. The task is the automatic assignment of keywords to multi-party dialogue episodes (i.e., thematically coherent stretches of spoken text). We show that using lexical resources in such a task yields slightly higher performance than a purely statistical method.
Similar resources
Automatic Construction of Persian ICT WordNet using Princeton WordNet
WordNet is a large lexical database of the English language in which nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (synsets). Each synset expresses a distinct concept. Synsets are interlinked by both semantic and lexical relations. WordNet is chiefly used for word sense disambiguation, information retrieval, and text translation. In this paper, we propose s...
The Development of Lexical Resources for Information Extraction from Text Combining WordNet and Dewey Decimal Classification
Lexicon definition is one of the main bottlenecks in the development of new applications in the field of Information Extraction from text. Generic resources (e.g., lexical databases) are promising for reducing the cost of specific lexica definition, but they introduce lexical ambiguity. This paper proposes a methodology for building application-specific lexica by using WordNet. Lexical ambiguit...
Extracting Lexico-conceptual Knowledge for Developing Persian WordNet
Semantic lexicons and lexical ontologies are major resources in natural language processing. Developing such resources is a time-consuming task, for which some automatic methods have been proposed. This paper describes some methods used in the semi-automatic development of FarsNet, a lexical ontology for the Persian language. FarsNet includes the Persian WordNet with more than 10000 synsets of nouns,...
Japanese Hyponymy Extraction based on a Term Similarity Graph
Semantic relations between words, such as hyponymy, synonymy and meronymy, have various information access applications (e.g. Web search) and the automatic extraction of such relations from corpora is an important research problem in natural language processing. For the Japanese language, there exist several linguistic resources that contain these relations, such as the Japanese Wordnet, Nihong...
Automatic keyword extraction using Latent Dirichlet Allocation topic modeling: Similarity with golden standard and users' evaluation
Purpose: This study investigates the automatic keyword extraction from the table of contents of Persian e-books in the field of science using LDA topic modeling, evaluating their similarity with golden standard, and users' viewpoints of the model keywords. Methodology: This is a mixed text-mining research in which LDA topic modeling is used to extract keywords from the table of contents of sci...
Journal title:
- CoRR
Volume: cs.CL/0410062
Publication date: 2004